\(H_0\): The null hypothesis, no effect
\(H_1\): The alternative hypothesis, there is an effect
We run a test, we get a p-value. What is it?
| Frequentist Statistics | Bayesian Statistics |
|---|---|
| 1. Probability is defined as the long-run frequency of events | 1. Probability represents a degree of belief or certainty about an event |
| 2. Parameters (like the “true value”) are fixed but unknown quantities. | 2. Parameters are treated as random variables with their own probability distributions. |
| 3. Asking about the probability of a hypothesis does not make sense | 3. Asking about the probability of a hypothesis is the main goal |
P-values are the language of science, whether we like them (we don’t) or not.
Tip
You have to understand p-values and their limits to talk to other scientists!
Say we want to compare two groups with a standard \(t\)-test, nothing fancy. Our ability to detect the differences (the statistical power) depends on the sample size and the effect size.
The \(y\) axis on this plot shows the power of the test, meaning how often you will be able to detect the difference using a \(t\)-test, assuming that the groups really differ by \(d\) on average.
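To get a feeling for how power depends on sample size and effect size, here is a minimal sketch. It uses a normal approximation to the two-sample \(t\)-test with equal group sizes, which is an assumption made here for simplicity and slightly overestimates power for small groups. The function name `approx_power` and the chosen values of `n_per_group` and `d` are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n_per_group, d, alpha=0.05):
    """Approximate power of a two-sided, two-sample test.

    Normal approximation (an assumption of this sketch): the test
    statistic is treated as normal with mean d * sqrt(n / 2) and
    unit variance, where d is the standardized effect size.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    shift = d * sqrt(n_per_group / 2)           # expected test statistic
    # Probability of landing in either rejection region
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Power grows with the group size for a fixed effect size d
for n in (10, 20, 50, 100):
    print(n, round(approx_power(n, d=0.5), 2))
```

With no real difference (`d = 0`) the function returns `alpha`, which is the false positive rate of the test, as it should be.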
What about the following setup:
This is a 2x2 design, and we need to consider the interaction term.
That is not even the worst thing.
Simple calculations show that assuming
then 36% of your “significant” results are false positives!
(Plus, you failed to detect 20% of the real differences)
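The arithmetic behind such a figure can be sketched as follows. The input values are assumptions chosen here because they reproduce the 36% and 20% quoted above: 10% of the tested hypotheses describe a real effect, the test has 80% power, and the significance threshold is 0.05. The function name `false_discovery_share` is hypothetical.

```python
def false_discovery_share(prior_true, power, alpha):
    """Share of 'significant' results that are false positives.

    prior_true: fraction of tested hypotheses with a real effect (assumed)
    power:      probability of detecting a real effect (assumed)
    alpha:      significance threshold (assumed)
    """
    true_positives = prior_true * power          # real effects detected
    false_positives = (1 - prior_true) * alpha   # null effects flagged
    return false_positives / (true_positives + false_positives)


share = false_discovery_share(prior_true=0.10, power=0.80, alpha=0.05)
missed = 1 - 0.80  # real effects that stay undetected at 80% power

print(f"False positives among significant results: {share:.0%}")
print(f"Real differences missed: {missed:.0%}")
```

Out of 1000 tests under these assumptions, 100 effects are real and 80 of them are detected, while 45 of the 900 null effects come out “significant”, so 45 of 125 significant results are false.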
flowchart LR
A(Program + Text) -->|knitr| B(Text with\nanalysis results)
B --> C[LaTeX]
C --> CC[PDF]
B --> D[Word]
B --> E[HTML]
B --> F[Presentation]
B --> G[Book]
This can be Rmarkdown, Quarto, Jupyter… the goal is that your code and your text are in one place, and the results of your calculations are entered automatically into the text.
In systems such as R markdown, you can put your analysis results directly in your text. For example, when I write that the \(p\)-value is equal to 0.05, I am writing this:
In systems such as R markdown, you can put your analysis
results directly in your text. For example, when I write that the
$p$-value is equal to `r p`, I am writing this:
The \(p\)-value above is not entered manually (as 0.05), but is the result of a statistical computation. If the data changes, if your analysis changes, the \(p\)-value above will automatically change as well.
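The same idea can be imitated in a few lines of plain Python. This is only a sketch of the principle, not how knitr or Quarto work internally: the result is computed from the data and substituted into the sentence, never typed by hand. The counts, the choice of an exact binomial test, and the names `binomial_p_value` and `render` are made up for illustration.

```python
from math import comb


def binomial_p_value(successes, trials):
    """Two-sided exact binomial test against a probability of 0.5.

    Under p = 0.5 the distribution is symmetric, so the two-sided
    p-value is twice the smaller tail, capped at 1.
    """
    k = min(successes, trials - successes)
    tail = sum(comb(trials, i) for i in range(k + 1)) / 2 ** trials
    return min(1.0, 2 * tail)


def render(template, **results):
    """Fill computed results into the text (stand-in for inline code)."""
    return template.format(**results)


# Made-up data: 15 successes in 20 trials
p = binomial_p_value(successes=15, trials=20)
text = render("The p-value is equal to {p:.3f}.", p=p)
print(text)
```

If the counts change, the printed sentence changes with them, and no number in the text has to be edited by hand.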
flowchart LR
A(Excel) --> B(Data import)
AA(CSV, TSV) --> B(Data import)
AAA(fastq, ...) --> B(Data import)
B --> C[Data\ncleanup]
C --> D[Long term storage]
C --> E[Analysis]
E --> D
E --> F(Figures)
E --> G(Manuscript\nfragments)
E --> H(Tables\nExcel files)
F --> I[You]
G --> I
H --> I
I --> E
In the diagram above, two things usually take the most hands-on time:
MARCH1 are converted to dates
Three reasons why you should follow these rules:
Never encode information as formatting, always use explicit columns
Color / font size / font style cannot be read automatically
Make a separate column for comments
Otherwise the values might be lost
Make a separate Excel sheet for column meta information
(for your reference)
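A small sketch of why explicit columns pay off: once exclusions and comments live in their own columns, a program can read and filter them, which it cannot do with colors or font styles. The table below is entirely made up, and the column names `exclude` and `comment` are one possible convention, not a prescribed one.

```python
import csv
import io

# Made-up example sheet, exported as CSV. Exclusions and remarks are
# explicit columns instead of cell colors or notes typed into value cells.
SHEET = """sample_id,gene,expression,exclude,comment
S1,MARCH1,12.5,no,
S2,MARCH1,3.1,yes,low RNA quality
S3,SEPT2,8.4,no,repeated measurement
"""

rows = list(csv.DictReader(io.StringIO(SHEET)))

# The exclusion flag is machine-readable, so filtering is one line
kept = [row for row in rows if row["exclude"] == "no"]

# Value cells contain only numbers, so they convert cleanly
mean_expression = sum(float(row["expression"]) for row in kept) / len(kept)

print([row["sample_id"] for row in kept])
print(mean_expression)
```

Python's `csv` module reads every field as text, so a gene name such as `MARCH1` stays a string here and is never reinterpreted as a date.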
Core Unit for Bioinformatics, BIH@Charite